MATCH-BOUNDED STRING REWRITING SYSTEMS


Alfons Geser, Dieter Hofbauer and Johannes Waldmann


ABSTRACT 

We introduce a new class of automated proof methods for the termination of rewriting 
systems on strings. The basis of all these methods is to show that rewriting preserves 
regular languages. To this end, letters are annotated with natural numbers, called 
match heights. If the minimal height of all positions in a redex is h then every position 
in the reduct will get height h + 1. In a match-bounded system, match heights are
globally bounded. Using recent results on deleting systems, we prove that rewriting by 
a match-bounded system preserves regular languages. Hence it is decidable whether a 
given rewriting system has a given match bound. We also provide a sufficient criterion 
for the absence of a match-bound. The problem of existence of a match-bound is 
still open. Match-boundedness for all strings can be used as an automated criterion 
for termination, for match-bounded systems are terminating. This criterion can be 
strengthened by requiring match-boundedness only for a restricted set of strings, for 
instance the set of right hand sides of forward closures. 

1 INTRODUCTION 

Rewriting is a model of computation. It allows one to handle questions like termination (there is
no infinite computation), normalization (a final configuration is reachable) and correctness
(no erroneous configuration is reachable). These questions can be stated in terms of sets
of descendants: if $R$ is a rewriting system, and $L$ is a language, then $R^*(L) = \{y \mid x \in L,\ x \to_R^* y\}$.
Now $R$ is correct for $L$ iff $R^*(L) \cap \mathrm{Err} = \emptyset$, and $R$ is normalizing for $L$ iff
$L \subseteq R^{-*}(\mathrm{Final})$, with Err and Final denoting the set of erroneous and final configurations,
respectively. Starting from classical program analysis, recent applications include verification 
of XML transformations [3] and cryptographic protocols [10]. 

From the point of view of these applications, the reachability relation R* should effectively 
respect language classes with good decidability and closure properties — like the class of 
regular languages. Some of us recently showed [17] that deleting string rewriting systems 
respect regular languages. In the present paper, we transfer this result to match-bounded 
string rewriting. 

Every match-bounded system terminates, and effectively preserves regularity of languages.
Therefore it is decidable whether a given system has a given match-bound. This
makes match-boundedness a new automatic criterion for termination. The criterion applies
for instance to Zantema’s System $\{a^2b^2 \to b^3a^3\}$ (match-bound 4) for which hitherto all
automated termination proof methods failed.

A string rewriting system R is called deleting if there exists a partial ordering on its 
alphabet such that each letter in the right hand side of a rule is less than some letter in the 
corresponding left hand side. Deleting systems can be understood as the inverses of context 
limited grammars as defined and investigated by Hibbard [16]. Deleting rewriting systems 
terminate and have linearly bounded derivational complexity. 

To obtain automated termination proofs, we transform rewriting systems as follows: 
We annotate letters with numbers, which we call match heights. A position in a reduct 
will get height h + 1 if the minimal height of all positions in the redex is h. A rewriting 
system is match-bounded if match heights of derivations are globally bounded. In this case 
its annotated system is finite and deleting. Termination and regularity preservation carry 
over from the annotated to the original system. The recognizing automaton for the set of 
descendants modulo the annotated system is a certificate for match-boundedness. 

We also study RFC-match-boundedness, a variant of the criterion, where a system has to
be match-bounded only for the set of right hand sides of its forward closures. By a result of
Dershowitz, termination there is sufficient for uniform termination.

Basic definitions, results and examples are given in Sections 3 and 4, while in Section 5 
we discuss how to verify or refute match-boundedness. In Section 6 we introduce RFC- 
match-boundedness, and consider some variants of this notion in Section 7. All main criteria 
are implemented (Section 8). Section 9 contains a short comparison of our new termination 
criteria with Zantema’s Termination Hierarchy. We conclude by discussing ramifications for 
further research in Section 10. 

Some of the results reported here have been presented at the 28th International Symposium
on Mathematical Foundations of Computer Science MFCS 2003 at Bratislava, Slovak
Republic [12] and at the 6th International Workshop on Termination WST 2003 at Valencia,
Spain [13].

2 PRELIMINARIES 

We mostly stick to standard notations for strings and string rewriting, as e.g. in [2]. We use
$\epsilon$ for the empty string, and $|x|$ is the length of a string $x$. Let REG denote the class of regular
languages. Further, for a language $L \subseteq \Sigma^*$, let $\mathrm{factor}(L) = \{y \in \Sigma^* \mid \exists x, z \in \Sigma^* : xyz \in L\}$.

A string rewriting system over an alphabet $\Sigma$ is a relation $R \subseteq \Sigma^* \times \Sigma^*$, inducing the
rewrite relation $\to_R = \{(x\ell y, xry) \mid x, y \in \Sigma^*,\ (\ell, r) \in R\}$ on $\Sigma^*$. Unless indicated otherwise,
all rewriting systems are finite. Pairs $(\ell, r)$ from $R$ are frequently referred to as rules $\ell \to r$.
By $\mathrm{lhs}(R)$ and $\mathrm{rhs}(R)$ we denote the sets of left (resp. right) hand sides of $R$. The reflexive and
transitive closure of $\to_R$ is $\to_R^*$, often abbreviated as $R^*$, and $\to_R^+$ or $R^+$ denote the transitive
closure. An $R$-derivation is a (finite or infinite) sequence $(x_0, x_1, \ldots)$ with $x_i \to_R x_{i+1}$ for
all $i$. We call $R$ terminating on $L \subseteq \Sigma^*$ if there is no infinite derivation starting with some
$x_0 \in L$. If $L = \Sigma^*$, we call $R$ terminating. In order to classify lengths of derivations, define
the derivation height function modulo $R$ on $\Sigma^*$ by $\mathrm{dh}_R(x) = \max\{n \in \mathbb{N} \mid \exists y \in \Sigma^* : x \to_R^n y\}$.
The derivational complexity of $R$ is defined as the function $n \mapsto \max\{\mathrm{dh}_R(x) \mid |x| \le n\}$ on
$\mathbb{N}$.

A rewriting rule $\ell \to r$ is context-free if $|\ell| \le 1$, and a rewriting system is context-free if
all its rules are.

For a relation $\rho \subseteq A \times B$ let $\rho(a) = \{b \in B \mid (a, b) \in \rho\}$ for $a \in A$ and $\rho(A') = \bigcup_{a \in A'} \rho(a)$
for $A' \subseteq A$. The inverse of $\rho$ is $\rho^{-} = \{(b, a) \mid (a, b) \in \rho\} \subseteq B \times A$, and we say that $\rho$ satisfies
the property inverse $P$ if $\rho^{-}$ satisfies $P$. Thus, the set of descendants of a language $L \subseteq \Sigma^*$
modulo some rewriting system $R$ is $R^*(L)$. The system $R$ is said to preserve regularity
(context-freeness) if $R^*(L)$ is a regular (context-free) language whenever $L$ is.

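For a relation $\rho \subseteq \Sigma^* \times \Sigma^*$ and a set $\Delta \subseteq \Sigma$, let $\rho|_\Delta$ denote $\rho \cap (\Delta^* \times \Delta^*)$. Note the
difference between $R^*|_\Delta$ and $(R|_\Delta)^*$ for a string rewriting system $R$. E.g., for $R = \{a \to b,\ b \to c\}$
over $\Sigma = \{a, b, c\}$ and $\Delta = \{a, c\}$ we have $(a, c) \in R^*|_\Delta$, but $(a, c) \notin (R|_\Delta)^*$.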

A relation $s \subseteq \Sigma^* \times \Gamma^*$ is a substitution if $s(\epsilon) = \{\epsilon\}$ and $s(xy) = s(x)s(y)$ for $x, y \in \Sigma^*$.
So a substitution $s$ is uniquely determined by the languages $s(a)$ for $a \in \Sigma$. If each language
$s(a)$ for $a \in \Sigma$ is finite, then $s$ is a finite substitution.

Now we recall definitions and results regarding deleting string rewriting systems [17], a
topic that goes back to Hibbard [16]. A string rewriting system $R$ over an alphabet $\Sigma$ is
$>$-deleting for an irreflexive partial ordering $>$ on $\Sigma$ (a precedence) if $\epsilon \notin \mathrm{lhs}(R)$, and if for
each rule $\ell \to r$ in $R$ and for each letter $a$ in $r$, there is some letter $b$ in $\ell$ with $b > a$. The
system $R$ is deleting if it is $>$-deleting for some precedence $>$.

Proposition 1 ([17]). Every deleting string rewriting system is terminating, and has linear 
derivational complexity. 

Furthermore, we have the following decomposition result. 

Theorem 1 ([17]). Let $R$ be a deleting string rewriting system over $\Sigma$. Then there are an
extended alphabet $\Gamma \supseteq \Sigma$, a finite substitution $s \subseteq \Sigma^* \times \Gamma^*$, and a context-free string rewriting
system $C$ over $\Gamma$ such that $R^* = (s \circ C^{-*})|_\Sigma$.

As a consequence, inverse deleting systems effectively preserve context-freeness, a result 
by Hibbard [16]. As another consequence we get: 

Corollary 1 ([17]). Every deleting string rewriting system effectively preserves regularity. 

3 MATCH-BOUNDED STRING REWRITING SYSTEMS 

We will now apply the theory of deleting systems to obtain results for match-bounded rewriting.
A derivation is match-bounded if dependencies between rule applications are limited.
To make this precise, we will annotate positions in strings by natural numbers that indicate
their match height. Positions in a reduct will get height h + 1 if the minimal height of all
positions in the corresponding redex was h.

Given an alphabet $\Sigma$, define the morphisms $\mathrm{lift}_c : \Sigma^* \to (\Sigma \times \mathbb{N})^*$ for $c \in \mathbb{N}$ by
$\mathrm{lift}_c : a \mapsto (a, c)$, $\mathrm{base} : (\Sigma \times \mathbb{N})^* \to \Sigma^*$ by $\mathrm{base} : (a, c) \mapsto a$, and
$\mathrm{height} : (\Sigma \times \mathbb{N})^* \to \mathbb{N}^*$ by $\mathrm{height} : (a, c) \mapsto c$. For a string rewriting system $R$ over $\Sigma$
such that $\epsilon \notin \mathrm{lhs}(R)$, we define the rewriting system

$$\mathrm{match}(R) = \{\ell' \to \mathrm{lift}_c(r) \mid (\ell \to r) \in R,\ \mathrm{base}(\ell') = \ell,\ c = 1 + \min(\mathrm{height}(\ell'))\}$$

over alphabet $\Sigma \times \mathbb{N}$. For instance, the system $\mathrm{match}(\{ab \to bc\})$ contains the rules
$a_0b_0 \to b_1c_1$, $a_0b_1 \to b_1c_1$, $a_1b_0 \to b_1c_1$, $a_1b_1 \to b_2c_2$, $a_0b_2 \to b_1c_1$, ..., writing $x_c$ as abbreviation
for $(x, c)$. For non-empty $R$, the system $\mathrm{match}(R)$ is always infinite. Note that systems
with $\epsilon \in \mathrm{lhs}(R)$ are trivially non-terminating, so the above restriction does not exclude any
interesting cases.
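
The definition above can be turned into a small enumeration of the finite restriction of match(R) to heights 0..c. The following is a minimal sketch, not the authors' implementation; the names `match_rules` and `show`, and the encoding of annotated strings as tuples of (letter, height) pairs, are assumptions made for illustration.

```python
from itertools import product


def match_rules(rules, c):
    """Enumerate the rules of match(R) that only use heights 0..c.

    rules: iterable of (lhs, rhs) pairs of plain strings, lhs non-empty.
    Annotated strings are encoded as tuples of (letter, height) pairs
    (an encoding chosen for this sketch).
    """
    annotated = set()
    for lhs, rhs in rules:
        if not lhs:
            raise ValueError("left-hand sides must be non-empty")
        for heights in product(range(c + 1), repeat=len(lhs)):
            new_height = 1 + min(heights)
            if rhs and new_height > c:
                continue  # right-hand side would use a height above c
            left = tuple(zip(lhs, heights))
            right = tuple((letter, new_height) for letter in rhs)
            annotated.add((left, right))
    return annotated


def show(word):
    """Render an annotated string such as a0 b1."""
    return " ".join(f"{letter}{height}" for letter, height in word)


if __name__ == "__main__":
    for left, right in sorted(match_rules([("ab", "bc")], 2)):
        print(show(left), "->", show(right))
```

For the system {ab → bc} and c = 2 this lists, among others, the rules a0 b0 → b1 c1 and a1 b1 → b2 c2 mentioned in the text.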

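Every derivation modulo $\mathrm{match}(R)$ corresponds to a derivation modulo $R$ (for $x, y \in (\Sigma \times \mathbb{N})^*$,
if $x \to_{\mathrm{match}(R)} y$ then $\mathrm{base}(x) \to_R \mathrm{base}(y)$) and vice versa (for $v, w \in \Sigma^*$ and
$x \in (\Sigma \times \mathbb{N})^*$, if $v \to_R w$ and $\mathrm{base}(x) = v$, then there is $y \in (\Sigma \times \mathbb{N})^*$ such that $\mathrm{base}(y) = w$
and $x \to_{\mathrm{match}(R)} y$). In particular, for $n \in \mathbb{N}$ we have $R^n = \mathrm{lift}_0 \circ \mathrm{match}(R)^n \circ \mathrm{base}$; thus

$$R^* = \mathrm{lift}_0 \circ \mathrm{match}(R)^* \circ \mathrm{base}.$$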

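Definition 1. A string rewriting system $R$ over $\Sigma$ is called match-bounded for $L \subseteq \Sigma^*$ by
$c \in \mathbb{N}$ if $\epsilon \notin \mathrm{lhs}(R)$ and $\max(\mathrm{height}(x)) \le c$ for every $x \in \mathrm{match}(R)^*(\mathrm{lift}_0(L))$. If we omit $L$,
then it is understood that $L = \Sigma^*$.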

Note that $\max(\mathrm{height}(x))$ (and $\min(\mathrm{height}(\ell'))$ in the definition of $\mathrm{match}(R)$) denotes
the maximum (minimum, respectively) over the corresponding sequences of heights; we set
$\max(\epsilon) = 0$, and we leave $\min(\epsilon)$ undefined as this case is excluded in the definition of
$\mathrm{match}(R)$. Obviously, a system that is match-bounded for $L$ is also match-bounded for any
subset of $L$ by the same bound. Further, if $R$ is match-bounded for $L$ then $R$ is match-bounded
for $R^*(L)$, again by the same bound.
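
For a single start string, the largest height occurring in match(R)*(lift_0(x)) can be found by exhaustive search whenever that set is finite. The sketch below does this by brute force; it only tests finitely many strings and therefore cannot prove a match-bound for an infinite language. The function names and the exploration limit are assumptions of this illustration, and left-hand sides are assumed non-empty.

```python
from itertools import product


def successors(word, rules):
    """One-step descendants of an annotated word modulo match(R)."""
    for lhs, rhs in rules:
        n = len(lhs)
        for i in range(len(word) - n + 1):
            window = word[i:i + n]
            if all(letter == x for (letter, _), x in zip(window, lhs)):
                h = 1 + min(height for _, height in window)
                yield word[:i] + tuple((x, h) for x in rhs) + word[i + n:]


def max_match_height(start, rules, limit=100_000):
    """Largest height in match(R)*(lift_0(start)); aborts after `limit` words."""
    lifted = tuple((x, 0) for x in start)
    seen, stack, best = {lifted}, [lifted], 0
    while stack:
        word = stack.pop()
        best = max([best] + [h for _, h in word])
        for nxt in successors(word, rules):
            if nxt not in seen:
                if len(seen) >= limit:
                    raise RuntimeError("exploration limit reached")
                seen.add(nxt)
                stack.append(nxt)
    return best


def max_height_up_to(alphabet, rules, max_len):
    """Maximum of max_match_height over all strings of length <= max_len."""
    words = ("".join(w) for n in range(max_len + 1)
             for w in product(alphabet, repeat=n))
    return max(max_match_height(w, rules) for w in words)


if __name__ == "__main__":
    print(max_height_up_to("abc", [("ab", "bc")], 5))
    print(max_height_up_to("ab", [("aa", "aba")], 5))
```

On strings of length at most 5, the two systems {ab → bc} and {aa → aba} reach heights 1 and 2 respectively.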

For a match-bounded system $R$, the infinite system $\mathrm{match}(R)$ may be replaced by a
finite restriction. Denote by $\mathrm{match}_c(R)$ the restriction of $\mathrm{match}(R)$ to the alphabet
$\Sigma \times \{0, 1, \ldots, c\}$.

Lemma 1. If $R$ is match-bounded for $L$ by $c$, then $R^n|_L = (\mathrm{lift}_0 \circ \mathrm{match}_c(R)^n \circ \mathrm{base})|_L$ for
$n \in \mathbb{N}$, thus

$$R^*|_L = (\mathrm{lift}_0 \circ \mathrm{match}_c(R)^* \circ \mathrm{base})|_L.$$

Lemma 2. For all $R$ with $\epsilon \notin \mathrm{lhs}(R)$ and all $c \in \mathbb{N}$, the system $\mathrm{match}_c(R)$ is deleting.

Proof. Use the precedence $>$ on $\Sigma \times \{0, \ldots, c\}$ where $(a, m) > (b, n)$ iff $m < n$. (Letters of
minimal match height are maximal in the precedence.) □

Theorem 2. If R is match-bounded for L, then R is terminating on L. 

Proof. An infinite $R$-derivation starting from an element of $L$ can be transformed into an
infinite $\mathrm{match}(R)$-derivation from an element of $\mathrm{lift}_0(L)$. The latter, given that $R$ is match-bounded
by $c$, is a $\mathrm{match}_c(R)$-derivation. However, $\mathrm{match}_c(R)$ is deleting by Lemma 2 and
hence terminating by Proposition 1. □

Likewise, Lemma 1 implies linearly bounded derivation lengths for match-bounded systems.

Proposition 2. Every match-bounded string rewriting system has linear derivational complexity.

We conclude this section with a few examples. 

Example 1. The system $\{ab \to bc\}$ is match-bounded by 1, $\{aa \to aba\}$ is match-bounded
by 2, $\{ab \to ac,\ ca \to bc\}$ is match-bounded by 2, and $\{ab \to ac,\ ca \to b\}$ is match-bounded
by 3.

All these bounds can be verified automatically, as will be explained in Section 5. The
next example illustrates that indeed any number can be a least match bound.

Example 2. The bubble sort system $B_2 = \{ab \to ba\}$ over the two-letter alphabet $\{a, b\}$ is
match-bounded for $a^*b^n$ by $n$, but not by $n - 1$. The system $\{a_i \to a_{i+1} \mid 0 \le i < n\}$ over
alphabet $\Sigma = \{a_i \mid 0 \le i \le n\}$ is match-bounded (for $\Sigma^*$) by $n$, but not by $n - 1$. As a
variant of the previous example, now over a fixed alphabet, consider the system
$\{ab^ic \to ab^{i+1}c \mid 0 \le i < n\}$ over $\{a, b, c\}$; it is match-bounded by $n$, but not by $n - 1$. The same
holds true for the length-preserving variant $\{ab^ic^{n-i+1} \to ab^{i+1}c^{n-i} \mid 0 \le i < n\}$.

Example 3. System $B_2$ is not match-bounded (for $\{a, b\}^*$) since it has quadratic derivational
complexity, contradicting the conclusion of Proposition 2.
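
The quadratic growth can be observed directly by computing derivation heights for small inputs. The following sketch computes the derivation height by memoised search; the names `one_step` and `dh` are chosen here for illustration, and the recursion assumes a terminating system.

```python
from functools import lru_cache

RULES = (("ab", "ba"),)  # the bubble sort system B_2


def one_step(word, rules=RULES):
    """All strings reachable from `word` by one rewrite step."""
    for lhs, rhs in rules:
        start = word.find(lhs)
        while start != -1:
            yield word[:start] + rhs + word[start + len(lhs):]
            start = word.find(lhs, start + 1)


@lru_cache(maxsize=None)
def dh(word):
    """Length of a longest derivation starting in `word` (assumes termination)."""
    return max((1 + dh(nxt) for nxt in one_step(word)), default=0)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, dh("a" * n + "b" * n))
```

Each step of $B_2$ removes exactly one pair consisting of an a somewhere to the left of a b, so the string $a^n b^n$ of length $2n$ has derivation height $n^2$.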

Dually to Lemma 2, we have: 

Proposition 3. If R is deleting, then R is match-bounded. 

Proof. Assume $R$ over $\Sigma$ is deleting for the precedence $>$ on $\Sigma$. Then $R$ is match-bounded
by the maximal height (i.e., length of a descending chain) in $(\Sigma, >)$. □

Example 4. The system $\{ba \to cb,\ bd \to d,\ cd \to de\}$ is match-bounded by 2, since it is
deleting for the precedence $a > b > d$, $a > c > e$, $c > d$.
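
The deleting property for a given precedence is a purely syntactic check. The sketch below tests it for Example 4 and computes the longest descending chain used in the proof of Proposition 3; the function names and the representation of a precedence as a set of (greater, smaller) pairs are assumptions of this illustration.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` of (greater, smaller)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def is_deleting(rules, precedence_pairs):
    """Check that the rules are >-deleting for the order generated by the pairs."""
    greater = transitive_closure(precedence_pairs)
    if any(a == b for a, b in greater):
        return False  # not irreflexive, hence not a precedence
    for lhs, rhs in rules:
        if lhs == "":
            return False
        for a in rhs:
            if not any((b, a) in greater for b in lhs):
                return False
    return True


def longest_chain(precedence_pairs):
    """Number of strict steps in a longest descending chain (needs a valid precedence)."""
    greater = transitive_closure(precedence_pairs)
    letters = {x for pair in greater for x in pair}

    def depth(x):
        return max((1 + depth(y) for (z, y) in greater if z == x), default=0)

    return max((depth(x) for x in letters), default=0)


if __name__ == "__main__":
    rules = [("ba", "cb"), ("bd", "d"), ("cd", "de")]
    prec = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "e"), ("c", "d")]
    print(is_deleting(rules, prec), longest_chain(prec))
```

For the precedence of Example 4 the check succeeds and the longest descending chain has two steps, in line with the stated bound 2.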

4 MATCH-BOUNDED SYSTEMS PRESERVE REGULARITY 

Here, we elaborate on the fact that match-bounded string rewriting systems always preserve
regularity. The section concludes with a short comparison of match-boundedness to the related
concept of change-boundedness [25].

Theorem 3. If $R$ is match-bounded for $L \in \mathrm{REG}$, then $R^*(L) \in \mathrm{REG}$.

Proof. By Lemma 1, $R^*(L) = \mathrm{base}(\mathrm{match}_c(R)^*(\mathrm{lift}_0(L)))$ for some $c \in \mathbb{N}$. As $\mathrm{match}_c(R)$ is
deleting by Lemma 2, thus regularity preserving by Corollary 1, and since REG is closed
under morphisms, we are done. □

Example 5. For $R = \{aaba \to abaab\}$ (cf. [19], p. 118) and $L = (aab)^*$, the language
$\mathrm{match}(R)^*(\mathrm{lift}_0(L))$ is accepted by the following automaton. We use generalized automata
where transitions are labelled by words instead of single letters.



By stripping heights from all letters, one obtains an automaton accepting R*(L). 

Example 6. The bubble sort system $B_2 = \{ab \to ba\}$ is not regularity preserving, since
$B_2^*((ab)^*) \cap b^*a^* = \{b^na^n \mid n \ge 0\}$ is not regular. So Theorem 3 implies that $B_2$ is not
match-bounded. (Cf. Example 3 for another indirect proof, and Example 10 for a direct
proof of the same fact.)

However, not every regularity preserving string rewriting system is match-bounded. For
instance, the system $\{aa \to a\}$ constitutes a counterexample. As a monadic system (i.e.,
$|\ell| > |r| \le 1$ for $(\ell \to r) \in R$) it preserves regularity [1, 2], but it is not match-bounded as
proven in Example 12.

Remark 1. There are terminating and regularity preserving systems with high derivational 
complexity as we are going to demonstrate. 

For an alphabet $\Sigma$, define the string rewriting system $\mathrm{Embed}(\Sigma) = \{a \to \epsilon \mid a \in \Sigma\}$.
By an application of Kruskal’s Theorem, the subword language $\mathrm{Embed}(\Sigma)^*(L)$ is regular for
each language $L$ over $\Sigma$, cf. Theorem 7.3 in [4]. This implies that for any rewriting system
$R$ over $\Sigma$, the system $R \cup \mathrm{Embed}(\Sigma)$ preserves (in fact, generates) regularity.

Termination of $R \cup \mathrm{Embed}(\Sigma)$ is called simple termination of $R$. By the above, every
simply terminating rewriting system $R$ can be extended to a (simply) terminating and
regularity preserving system while keeping or increasing its derivational complexity. E.g.,
$\{ab \to ba,\ a \to \epsilon,\ b \to \epsilon\}$ preserves regularity, and has quadratic complexity.

Example 7. Peg solitaire is a one-person game. The objective is to remove pegs from a board.
A move consists of one peg X hopping over an adjacent peg Y, landing on the empty space
on the opposite side of Y. After the hop, Y is removed. Peg solitaire on a one-dimensional
board corresponds to the string rewriting system

$$P = \{■■□ \to □□■,\ □■■ \to ■□□\}$$

where ■ stands for “peg”, and □ for “empty”. One is interested in the language of all
positions that can be reduced to one single peg, which is $P^{-*}(□^*■□^*)$. Regularity of
$P^{-*}(□^*■□^*)$ is a “folklore theorem”, see [24] for its history. The system $P^{-}$ is match-bounded
by 2, so we obtain yet another proof of that result.

Remark 2. Ravikumar [25] proves that $P$ preserves regularity by considering the system’s
change-bound (which is 4). Change-boundedness is similar to match-boundedness. Given a
length-preserving string rewriting system $R$ (viz. $|\ell| = |r|$ for every rule $\ell \to r$), define the
system

$$\mathrm{change}(R) = \{\ell \to r \mid (\mathrm{base}(\ell) \to \mathrm{base}(r)) \in R,\ \mathrm{height}(\mathrm{succ}(\ell)) = \mathrm{height}(r)\}$$

over alphabet $\Sigma \times \mathbb{N}$, where succ is the morphism $\mathrm{succ} : (\Sigma \times \mathbb{N})^* \to (\Sigma \times \mathbb{N})^*$ induced
by $\mathrm{succ} : (a, h) \mapsto (a, h + 1)$. For instance, the system $\mathrm{change}(\{ab \to bc\})$ contains the
rules $a_0b_0 \to b_1c_1$, $a_0b_1 \to b_1c_2$, $a_1b_0 \to b_2c_1$, $a_1b_1 \to b_2c_2$, $a_0b_2 \to b_1c_3$, .... Ravikumar
proves that if $\mathrm{change}(R)^*(\mathrm{lift}_0(L))$ has bounded height, then $R$ preserves regularity of $L$.
In contrast to change-bounds, match-bounds are also applicable to non-length-preserving
systems. For length-preserving systems, $\mathrm{match}(R)$ will always give lower or equal heights,
so our result directly implies Ravikumar’s. In fact, it can also be shown conversely that
match-boundedness implies change-boundedness for length-preserving systems.

5 VERIFICATION AND REFUTATION OF MATCH-BOUNDS 

In this section, we show that match-boundedness by a given bound is decidable. Further, 
we provide a sufficient condition for the absence of a match bound. We leave decidability of 
match-boundedness as an open problem. 





Theorem 4. The following problem is decidable:

Given: A string rewriting system $R$, a regular language $L$, and $c \in \mathbb{N}$.

Question: Is $R$ match-bounded for $L$ by $c$?

Proof. Construct a finite automaton for $L_{c+1} = \mathrm{match}_{c+1}(R)^*(\mathrm{lift}_0(L))$, using Theorem 3.
Then $R$ is match-bounded for $L$ by $c$ iff $\max(\mathrm{height}(L_{c+1})) \le c$. □

Any given automaton over alphabet $\Sigma \times \mathbb{N}$ can be seen as a potential certificate of the
fact that $R$ is match-bounded for $L$ by $c$, and hence of termination of $R$ on $L$. The certificate
is valid if the accepted language

1. includes $\mathrm{lift}_0(L)$,

2. is closed under rewriting modulo $\mathrm{match}_{c+1}(R)$, and

3. contains no letter of height $c + 1$.


The first two items imply that $\mathrm{match}_{c+1}(R)^*(\mathrm{lift}_0(L))$ is included in the accepted language.
Validity of such a certificate can be decided by standard algorithms for finite automata.

Example 8. For $R = \{aa \to aba\}$ and $L = \{a, b\}^*$, the set $\mathrm{match}(R)^*(\mathrm{lift}_0(L))$ is accepted by
the following (non-deterministic) automaton. (Again, we use an obvious generalized notation
where transitions are labelled by sets of words.)



Closure under $\mathrm{match}(R)$ can be verified by checking off the table on the right. Since the
highest label is 2, the automaton certifies that $R$ is match-bounded by 2, as claimed in the
introductory Example 1.

For an implementation, the growth of $|\mathrm{match}_c(R)|$ as a function of $c$ is problematic.
However, when computing $\mathrm{match}_c(R)^*(\mathrm{lift}_0(L))$, we may restrict attention to those rules of
$\mathrm{match}_c(R)$ that are accessible in derivations starting from $\mathrm{lift}_0(L)$. For a language $L \subseteq \Sigma^*$,
a system $R$ over $\Sigma$, and a system $S \subseteq \mathrm{match}(R)$ define

$$\mathrm{accessible}(L, R, S) = \mathrm{match}(R) \cap (\mathrm{factor}(S^*(\mathrm{lift}_0(L))) \times (\Sigma \times \mathbb{N})^*).$$

Note that this construction is effective if a finite system $S$ and a regular language $L$ are
effectively given. We construct a sequence of rewriting systems $R_i$ by $R_0 = \emptyset$ and
$R_{i+1} = \mathrm{accessible}(L, R, R_i)$. Induction on $i$ shows $R_i \subseteq \mathrm{match}_i(R)$ for $i \ge 0$. In particular, every
system $R_i$ is finite. By induction on $i$, using that $S \subseteq S'$ implies $\mathrm{accessible}(L, R, S) \subseteq
\mathrm{accessible}(L, R, S')$, one also proves that $R_i \subseteq R_{i+1}$. Define $R_\infty = \bigcup_{i \in \mathbb{N}} R_i$. Clearly,
$R_\infty^*(\mathrm{lift}_0(L)) = \mathrm{match}(R)^*(\mathrm{lift}_0(L))$. If $R$ is match-bounded for $L$ by $c$, then $R_\infty$ is a subset
of $\mathrm{match}_c(R)$; so $R_\infty$ is finite, and there is an index $N$ such that $R_N = R_{N+1} = \cdots$. If $R$ is
not match-bounded for $L$, then $R_\infty$ contains for each $c$ a rule with height $c$, and therefore
is infinite. We remark that the enumeration of $R_i$ up to $i = |\mathrm{match}_c(R)| + 1$ can be used as
an alternative decision procedure for Theorem 4.
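
The iteration $R_0, R_1, \ldots$ can be sketched as follows for the special case of a finite start language, where descendant sets can be enumerated explicitly instead of being represented by automata. This is a simplification made for illustration only: the paper's setting is regular languages, and the names `rewrite_closure`, `accessible` and `accessible_fixpoint` as well as the round and size limits are assumptions. Left-hand sides are assumed non-empty.

```python
def rewrite_closure(words, rules, limit=100_000):
    """All descendants of annotated `words` modulo the annotated `rules`."""
    seen, stack = set(words), list(words)
    while stack:
        word = stack.pop()
        for left, right in rules:
            n = len(left)
            for i in range(len(word) - n + 1):
                if word[i:i + n] == left:
                    nxt = word[:i] + right + word[i + n:]
                    if nxt not in seen:
                        if len(seen) >= limit:
                            raise RuntimeError("exploration limit reached")
                        seen.add(nxt)
                        stack.append(nxt)
    return seen


def accessible(start_words, base_rules, current):
    """Rules of match(R) whose left-hand side is a factor of current*(lift_0(L))."""
    lifted = {tuple((x, 0) for x in w) for w in start_words}
    found = set()
    for word in rewrite_closure(lifted, current):
        for lhs, rhs in base_rules:
            n = len(lhs)
            for i in range(len(word) - n + 1):
                window = word[i:i + n]
                if all(letter == x for (letter, _), x in zip(window, lhs)):
                    h = 1 + min(height for _, height in window)
                    found.add((window, tuple((x, h) for x in rhs)))
    return found


def accessible_fixpoint(start_words, base_rules, max_rounds=50):
    """Iterate R_0 = {} and R_{i+1} = accessible(L, R, R_i) until it stabilises."""
    current = set()
    for _ in range(max_rounds):
        nxt = accessible(start_words, base_rules, current)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("no fixpoint within max_rounds")


if __name__ == "__main__":
    rules = accessible_fixpoint({"aaaa"}, [("aa", "aba")])
    print(len(rules), max(h for _, right in rules for _, h in right))
```

For $R = \{aa \to aba\}$ and the single start string $aaaa$, the iteration stabilises with four annotated rules, the highest of which produces height 2.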

Example 9. Proving termination of the one-rule system $Z = \{a^2b^2 \to b^3a^3\}$ is known as
Zantema's Problem. This is a “modern classic” in rewriting [5, 8, 19, 27, 28, 32], as it provides a
test case where all previous automated methods for termination proofs fail. Our algorithm
constructs in 6 iterations a deterministic automaton with 85 states. This automaton recognizes
$\mathrm{match}(Z)^*(\mathrm{lift}_0(\Sigma^*))$ and certifies that $Z$ is match-bounded for $\Sigma^*$ by 4. This also
proves that $Z$ has only linear derivational complexity, a result by Tahhan-Bittar [28].

Sometimes we can also verify automatically that a given rewriting system $R$ ($\epsilon \notin \mathrm{lhs}(R)$)
is not match-bounded for a language $L$. For this purpose, we want a non-empty witnessing
language $W \subseteq L$ such that every element in $W$ can be reached from some element in $W$ by
an all-height increasing derivation. By chaining such derivations, strings of arbitrary height
can be derived, disproving match-boundedness. In the remainder of this section we formalize
this argument.

For $u, v \in (\Sigma \times \mathbb{N})^*$ we write $u \ge v$ if $\mathrm{base}(u) = \mathrm{base}(v)$ and $\mathrm{height}(u) \ge_n \mathrm{height}(v)$,
where $\ge_n$ denotes the pointwise greater-or-equal ordering on $\mathbb{N}^n$. We assume $W \subseteq \Sigma^+$. A
string $y \in W$ is reached from $x \in W$ if there is a derivation $\mathrm{lift}_0(x) \to^*_{\mathrm{match}(R)} p\,y'\,q$ for some
string $y' \ge \mathrm{lift}_1(y)$ and strings $p, q$. Now every element in $W$ is reached from an element in
$W$ if $W \subseteq \mathrm{raised}(R, W)$, where the latter set of strings is defined by

$$\mathrm{raised}(R, W) = \mathrm{base}(\mathrm{factor}(\mathrm{match}(R)^*(\mathrm{lift}_0(W))) \cap (\Sigma \times (\mathbb{N} \setminus \{0\}))^+).$$

First we observe that a $\mathrm{match}(R)$-derivation can always be raised to greater heights since
the two relations $\ge$ and $\to_{\mathrm{match}(R)}$ commute:

Lemma 3. $\ge \circ \to_{\mathrm{match}(R)} \;\subseteq\; \to_{\mathrm{match}(R)} \circ \ge$.

Proposition 4. Let $R$ be a string rewriting system such that $\epsilon \notin \mathrm{lhs}(R)$, and let $W$ be a
non-empty language, both over $\Sigma$. If $W \subseteq \mathrm{raised}(R, W)$, then $R$ is not match-bounded for
$W$.

Proof. We prove a stronger claim: If $W \subseteq \mathrm{raised}(R, W)$ then, for every $c \ge 0$,

$$W \subseteq \mathrm{base}(\mathrm{factor}(\mathrm{match}(R)^*(\mathrm{lift}_0(W))) \cap (\Sigma \times \{c, c + 1, \ldots\})^+).$$

In other words, every element of $W$ can receive unbounded match heights. We prove this
claim by induction on $c$. Consider $y \in W$. For $c = 0$ we obtain $y \in \mathrm{base}(\mathrm{factor}(\mathrm{lift}_0(y)) \cap
(\Sigma \times \mathbb{N})^+)$. So assume $c > 0$. By inductive hypothesis there is a string $u \in W$ and a
derivation

$$\mathrm{lift}_0(u) \to^*_{\mathrm{match}(R)} p\,y'\,q$$

with $y' \ge \mathrm{lift}_{c-1}(y)$. This derivation can be relabelled to a derivation

$$\mathrm{lift}_1(u) \to^*_{\mathrm{match}(R)} \mathrm{succ}(p)\,\mathrm{succ}(y')\,\mathrm{succ}(q)$$

with $\mathrm{succ}(y') \ge \mathrm{succ}(\mathrm{lift}_{c-1}(y)) = \mathrm{lift}_c(y)$, where succ is the morphism defined in Section 4,
increasing the height of each position by 1. Since $u \in W \subseteq \mathrm{raised}(R, W)$, there is $v \in W$
and a derivation

$$\mathrm{lift}_0(v) \to^*_{\mathrm{match}(R)} p'\,u'\,q' \qquad (1)$$

with $u' \ge \mathrm{lift}_1(u)$. By Lemma 3 we get a derivation

$$u' \to^*_{\mathrm{match}(R)} p''\,y''\,q'' \qquad (2)$$

for some $y'' \ge \mathrm{succ}(y')$. We conclude by composing (1) and (2) into a derivation

$$\mathrm{lift}_0(v) \to^*_{\mathrm{match}(R)} p'\,u'\,q' \to^*_{\mathrm{match}(R)} p'\,p''\,y''\,q''\,q'$$

with $y'' \ge \mathrm{lift}_c(y)$. □

A slightly weaker version of Proposition 4 is obtained as follows: Define

$$\mathrm{raised}_c(R, W) = \mathrm{base}(\mathrm{factor}(\mathrm{match}_c(R)^*(\mathrm{lift}_0(W))) \cap (\Sigma \times (\mathbb{N} \setminus \{0\}))^+),$$

and replace $\mathrm{raised}(R, W)$ in Proposition 4 by $\mathrm{raised}_c(R, W)$:

Corollary 2. Let $R$ be a string rewriting system such that $\epsilon \notin \mathrm{lhs}(R)$, and let $W$ be a
non-empty language, both over $\Sigma$. If $W \subseteq \mathrm{raised}_c(R, W)$ for some $c \in \mathbb{N}$, then $R$ is not
match-bounded for $W$.

This version can be effectively checked if a finite system $R$, a number $c \in \mathbb{N}$, and a regular
language $W$ are effectively given.

Example 10. The system $B_2 = \{ab \to ba\}$ (cf. Example 6) is not match-bounded for $\Sigma^*$.
Take $W = (ab)^+$. Then $\mathrm{raised}_1(B_2, W) = \mathrm{factor}((ba)^+) \supseteq W$.

Example 11. Neither is $R = \{aabb \to ba\}$ match-bounded, as witnessed by $W = \{a, b\}^+ =
\mathrm{raised}_1(R, W)$. This can be seen as follows. Define the two morphisms $\phi : a \mapsto a,\ b \mapsto abb$
and $\psi : a \mapsto aab,\ b \mapsto b$. Then, for each $y \in \Sigma^*$, there are derivations

$$a\,\phi(y) \to_R^* y\,a \quad\text{and}\quad \psi(y)\,b \to_R^* b\,y,$$

and these can be combined to a derivation

$$a\,\phi(\psi(y))\,abb = a\,\phi(\psi(y)\,b) \to_R^* \psi(y)\,b\,a \to_R^* b\,y\,a.$$

When lifting this to a $\mathrm{match}(R)$-derivation, starting from heights 0, all final heights are 1.
This proves that for each $y \in W = \Sigma^+$, there is $x = a\,\phi(\psi(y))\,abb \in \Sigma^+ = W$ with the
required property. In contrast, the system $\{a^3b^3 \to b^2a^2\}$ is match-bounded by 2.

Example 12. The regularity preserving system $R = \{aa \to a\}$ is not match-bounded: check
that $W = \{a^{2^n} \mid n \in \mathbb{N}\} \subseteq \mathrm{raised}_1(R, W)$. Alternatively, $W' = a^+ \subseteq \mathrm{raised}_1(R, W')$.

Example 13. The system $R = \{ab \to bba\}$ is not match-bounded for $\Sigma^*$ because it admits
derivations $a^nb \to_R^{2^n-1} b^{2^n}a^n$ of exponential lengths. Another proof can be given by
Proposition 4. One shows by induction that, for $k \ge 0$,

1. $a_kb_{k+i} \to_{\mathrm{match}(R)} b_{k+1}^2\,a_{k+1}$ for $i \ge 0$, and

2. aohhbj . . . bf _1 ^tcHR) Mi&| ■ ■ ■ Ci°fc+i- 


So $a_0^m b_0$, $m > 1$, rewrites to a string that contains the factor $a_{m-1} \cdots a_1 b_1$:

b 0 ^match(R) ® 0 b^tt\ ^match(R) b l b l b 2 • ■ • b m— 1 1 • • • 


Hence $W \subseteq \mathrm{raised}(R, W)$ for $W = a^+b$. Note that this set of witnesses is regular, but
the given derivations (verifying the witnesses) are not globally match-bounded. On the
other hand, we can have match-bounded verification for the non-regular set of witnesses
$W' = \{ab^{2^n}ab^{2^{n-1}} \cdots ab \mid n \in \mathbb{N}\}$, since $W' \subseteq \mathrm{raised}_1(R, W')$.

Looping string rewriting systems form a particular subclass of the class of all
non-terminating systems. A loop is a derivation of the form $s \to_R^+ psq$ for strings $s, p, q \in \Sigma^*$. As
it turns out, the existence of a loop can be characterized in terms of finite sets of witnesses,
as follows.


Proposition 5. A string rewriting system $R$ admits a loop if and only if there is a finite,
non-empty set $W$ such that $W \subseteq \mathrm{raised}(R, W)$.

Proof. If $R$ admits a loop then it also admits a loop $s \to_R^+ psq$ during which every position
between letters is touched [14]. So $\mathrm{lift}_0(s) \to^+_{\mathrm{match}(R)} p's'q'$ for some $s' \ge \mathrm{lift}_1(s)$. The
claim holds with $W = \{s\}$. Conversely, let $W \subseteq \mathrm{raised}(R, W)$. Then for every $k \ge 0$
there is a sequence $w_0, w_1, \ldots, w_k$ such that $w_{i+1} \in \mathrm{raised}(R, \{w_i\})$ for $0 \le i < k$, thus
$w_j \in \mathrm{raised}(R, \{w_i\})$ for $0 \le i < j \le k$. For $k = |W|$, by the pigeonhole principle, there are
$i < j$ such that $w_i = w_j$. Hence $w_i = w_j \in \mathrm{factor}(R^+(\{w_i\}))$ forms the desired loop. □

The converse of Proposition 4 is open:

Problem 1. Does every string rewriting system $R$ such that $\epsilon \notin \mathrm{lhs}(R)$ that is not
match-bounded have a non-empty set $W \subseteq \mathrm{raised}(R, W)$?

If the stronger statement “. . . have some $c \in \mathbb{N}$ and a non-empty regular set $W \subseteq
\mathrm{raised}_c(R, W)$” holds then match-boundedness is decidable: One can simultaneously enumerate
these certificates $(c, W)$ along with certificates for match-boundedness (according to
Theorem 4). Example 13 seems to indicate that the stronger statement is false. So the
following remains open:

Problem 2. Is match-boundedness decidable? 

6 MATCH-BOUNDS FOR FORWARD CLOSURES 

We have shown that match-boundedness for $L$ is a criterion for termination on $L$. To prove
termination on $\Sigma^*$ however, the obvious choice $L = \Sigma^*$ may be too restrictive as it even
entails linear derivational complexity. We are going to show that the set of right hand sides
of forward closures [20, 7] is a better choice for $L$. For a string rewriting system $R$ over $\Sigma$,
the set of forward closures $\mathrm{FC}(R) \subseteq \Sigma^* \times \Sigma^*$ is defined as the least set containing $R$ such
that


• if $(u, v) \in \mathrm{FC}(R)$ and $v \to_R w$, then $(u, w) \in \mathrm{FC}(R)$ (inside reduction), and

• if $(u, v\ell_1) \in \mathrm{FC}(R)$ and $(\ell_1\ell_2 \to r) \in R$ for strings $\ell_1 \ne \epsilon$, $\ell_2 \ne \epsilon$, then $(u\ell_2, vr) \in
\mathrm{FC}(R)$ (right extension).

Let $\mathrm{RFC}(R)$ denote the set of right hand sides of forward closures. Equivalently, $\mathrm{RFC}(R)$ is
the least subset of $\Sigma^*$ containing $\mathrm{rhs}(R)$ such that

• if $v \in \mathrm{RFC}(R)$ and $v \to_R w$, then $w \in \mathrm{RFC}(R)$, and

• if $v\ell_1 \in \mathrm{RFC}(R)$ and $(\ell_1\ell_2 \to r) \in R$ for $\ell_1 \ne \epsilon$, $\ell_2 \ne \epsilon$, then $vr \in \mathrm{RFC}(R)$.

Theorem 5 ([6]). A string rewriting system $R$ is terminating on $\Sigma^*$ if and only if $R$ is
terminating on $\mathrm{RFC}(R)$.

Theorem 2 for $L = \mathrm{RFC}(R)$ yields:

Corollary 3. Every string rewriting system $R$ that is match-bounded for $\mathrm{RFC}(R)$ is terminating.

Example 14. The system $R = \{aa \to aba\}$ (cf. Example 8) is match-bounded for $\mathrm{RFC}(R)$
by 0 since the set $\mathrm{RFC}(R) = (ab)^+a$ consists of strings in normal form. Therefore, $R$ is
terminating.

We can obtain $\mathrm{RFC}(R)$ as a set of descendants modulo the rewriting system $R_\# =
R \cup \{\ell_1\# \to r \mid (\ell_1\ell_2 \to r) \in R,\ \ell_1 \ne \epsilon,\ \ell_2 \ne \epsilon\}$ over alphabet $\Sigma \cup \{\#\}$, where right extension
is simulated via the new end-marker $\# \notin \Sigma$. Indeed,

$$\mathrm{RFC}(R) = R_\#^*(\mathrm{rhs}(R) \cdot \#^*) \cap \Sigma^*$$

is an immediate consequence of the following equality.

Lemma 4. Let $R$ be a string rewriting system over $\Sigma$, where $\# \notin \Sigma$. Then $\mathrm{RFC}(R) \cdot \#^* =
R_\#^*(\mathrm{rhs}(R) \cdot \#^*)$.

Proof. Show the inclusion from left to right by induction over the definition of $\mathrm{RFC}(R)$.
Conversely, $R_\#^n(\mathrm{rhs}(R) \cdot \#^*) \subseteq \mathrm{RFC}(R) \cdot \#^*$ is shown by induction over $n$. □
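
The end-marker construction lends itself to a direct experiment. The sketch below builds $R_\#$ and enumerates those marker-free descendants of $\mathrm{rhs}(R) \cdot \#^k$ for $k$ up to a chosen bound; by Lemma 4 every such string belongs to $\mathrm{RFC}(R)$, so the result is a finite sample of $\mathrm{RFC}(R)$, not the whole set. The function names, the bound `max_markers` and the exploration limit are assumptions of this illustration, and left-hand sides are assumed non-empty.

```python
def end_marker_system(rules, marker="#"):
    """R_# = R plus l1# -> r for every split l1 l2 of a left-hand side, both non-empty."""
    extended = set(rules)
    for lhs, rhs in rules:
        for cut in range(1, len(lhs)):  # l1 = lhs[:cut], l2 = lhs[cut:]
            extended.add((lhs[:cut] + marker, rhs))
    return extended


def descendants(starts, rules, limit=100_000):
    """All strings reachable from `starts`; aborts after `limit` strings."""
    seen, stack = set(starts), list(starts)
    while stack:
        word = stack.pop()
        for lhs, rhs in rules:
            i = word.find(lhs)
            while i != -1:
                nxt = word[:i] + rhs + word[i + len(lhs):]
                if nxt not in seen:
                    if len(seen) >= limit:
                        raise RuntimeError("exploration limit reached")
                    seen.add(nxt)
                    stack.append(nxt)
                i = word.find(lhs, i + 1)
    return seen


def rfc_sample(rules, max_markers, marker="#"):
    """Marker-free descendants of rhs(R) followed by at most `max_markers` markers."""
    system = end_marker_system(rules, marker)
    starts = {rhs + marker * k for _, rhs in rules for k in range(max_markers + 1)}
    return {w for w in descendants(starts, system) if marker not in w}


if __name__ == "__main__":
    print(sorted(rfc_sample([("aa", "aba")], 3), key=len))
```

For $R = \{aa \to aba\}$ the sample consists of the strings $(ab)^ja$ with $1 \le j \le 4$, in agreement with $\mathrm{RFC}(R) = (ab)^+a$ from Example 14.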

Definition 2. The string rewriting system $R$ is RFC-match-bounded if $R_\#$ is match-bounded
for $\mathrm{rhs}(R) \cdot \#^*$.

Recall that $R_\#$ is match-bounded for $\mathrm{rhs}(R) \cdot \#^*$ if and only if $R_\#$ is match-bounded for
$R_\#^*(\mathrm{rhs}(R) \cdot \#^*)$.

Corollary 4. If a string rewriting system $R$ is RFC-match-bounded, then the language
$\mathrm{RFC}(R)$ is regular.

Lemma 5. If a string rewriting system $R$ is RFC-match-bounded, then $R$ is match-bounded
for $\mathrm{RFC}(R)$.

Proof. If $R_\#$ is match-bounded for $R_\#^*(\mathrm{rhs}(R) \cdot \#^*)$, then $R$ is match-bounded for $\mathrm{RFC}(R)$
as $R \subseteq R_\#$ and, by Lemma 4, $\mathrm{RFC}(R) \subseteq R_\#^*(\mathrm{rhs}(R) \cdot \#^*)$. □

However, RFC-match-boundedness and match-boundedness for $\mathrm{RFC}(R)$ are not equivalent,
see Example 20 for a counterexample.

Combining the previous lemma with Theorems 2 and 5, we obtain the following termination
criterion.

Theorem 6. Every RFC-match-bounded string rewriting system is terminating.

Example 15. Zantema’s system $Z = \{a^2b^2 \to b^3a^3\}$ from Example 9 is RFC-match-bounded
by 4, as the following finite automaton accepts the language $\mathrm{match}_5(Z_\#)^*(\mathrm{lift}_0(\mathrm{rhs}(Z) \cdot \#^*))$.



This automaton is a certificate for termination (cf. the remark after Theorem 4). 

Example 16. The system $B_2 = \{ab \to ba\}$ from Example 6 is RFC-match-bounded by 1
since $\mathrm{match}(B_{2\#})^*(\mathrm{lift}_0(\mathrm{rhs}(B_2) \cdot \#^*)) = b_0(a_0 \cup b_1^+a_1)\#_0^*$. It is not match-bounded, see
Example 10.

Example 17. The system $R = \{ab \to bba\}$ is RFC-match-bounded by 1. Here, as can easily
be seen, $\mathrm{match}(R_\#)^*(\mathrm{lift}_0(\mathrm{rhs}(R) \cdot \#^*)) = b_0^2(a_0 \cup (b_1^2)^+a_1)\#_0^*$. Again, this system is not
match-bounded, see Example 13.

Examples 16 and 17 show that RFC-match-bounded systems, unlike match-bounded
systems, may have non-linear derivational complexities. We do not know of an
RFC-match-bounded system with longer than exponential derivations.

Example 18. The bubble-sort system over a three-letter alphabet, $B_3 = \{ab \to ba,\ ac \to
ca,\ bc \to cb\}$, is not match-bounded for $\mathrm{RFC}(B_3)$, and hence not RFC-match-bounded. To
prove it, check $b^+c^+a \subseteq \mathrm{RFC}(B_3)$, and observe that $\{bc \to cb\} \subseteq B_3$ is not match-bounded
for $b^+c^+$, cf. Example 6. In contrast, all proper subsystems of $B_3$ are RFC-match-bounded
by 1.

Example 19. For $R = \{ab \to baa\}$, we have $\mathrm{RFC}(R) \cap b^* a^* = \{b^n a^{2^n} \mid n \ge 1\} \notin \mathrm{REG}$. By
Corollary 4, R is not RFC-match-bounded, in contrast to Example 17 (whose reversal is R up
to exchanging the letters a and b). This shows that the
class of RFC-match-bounded systems is not closed under reversal, i.e., under the operation
$R \mapsto \{\mathrm{rev}(\ell) \to \mathrm{rev}(r) \mid (\ell \to r) \in R\}$, where $\mathrm{rev}(a_1 a_2 \dots a_n) = a_n \dots a_2 a_1$ for $a_i \in \Sigma$. (Note
that the class of terminating systems is trivially closed under reversal.)
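
The two observations of Example 19 can be illustrated by a small Python sketch. It repeatedly extends a string by the suffix b and rewrites to normal form, which mimics the right extensions that generate $b^n a^{2^n}$, and it checks that reversing the system of Example 17 gives the system of Example 19 after exchanging a and b. The function names are hypothetical, and the leftmost-redex strategy in `normal_form` is a choice made for this sketch.

```python
def normal_form(rules, word):
    """Rewrite until no rule applies (terminates for the one-rule system below)."""
    while True:
        for lhs, rhs in rules:
            pos = word.find(lhs)
            if pos != -1:
                word = word[:pos] + rhs + word[pos + len(lhs):]
                break
        else:
            return word

def reverse_system(rules):
    """The reversal operation: reverse both sides of every rule."""
    return [(lhs[::-1], rhs[::-1]) for lhs, rhs in rules]

def rename(rules, table):
    """Apply a letter renaming, given as a dict, to both sides of every rule."""
    swap = str.maketrans(table)
    return [(lhs.translate(swap), rhs.translate(swap)) for lhs, rhs in rules]

R19 = [("ab", "baa")]
word = "baa"                               # rhs(R), the case n = 1
for n in range(1, 6):
    assert word == "b" * n + "a" * 2 ** n  # b^n a^(2^n)
    word = normal_form(R19, word + "b")    # extend by the suffix b, then rewrite

R17 = [("ab", "bba")]
print(rename(reverse_system(R17), {"a": "b", "b": "a"}) == R19)
```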
7 RFC-MATCH-BOUNDEDNESS AND RELATED CONDITIONS


As a sufficient condition for termination of a string rewriting system R, we introduced
match-boundedness of R for RFC(R). In order to construct RFC(R), we used the enriched system
$R_\#$. This system contains additional rules that subtly influence match heights, as indicated
in this section.


Example 20. Here we present an example demonstrating that the converse of Lemma 5
does not hold. We claim that the string rewriting system R over the alphabet $\{a, b, c, d, e\}$
with rules

$\{a \to b,\ b \to cd,\ de \to a,\ cb \to a\}$

is match-bounded for RFC(R), but not RFC-match-bounded.

Claim 1: R is match-bounded for RFC(R). Indeed, it is straightforward to verify that R
is match-bounded by 3 for $\mathrm{RFC}(R) = c^* a \cup c^* b \cup c^+ d$.

Claim 2: R is not RFC-match-bounded. This is a direct consequence of the fact that,
for $z \in \{0, 1\}$ and for any $n \ge 1$,

$$a_z \#_0^{2^n - 1} \to^*_{\mathrm{match}(R_\#)} a_{2n+1}.$$

The proof is by induction on n. For $n = 1$ we have
$a_z \#_0 \to b_{z+1} \#_0 \to c_{z+2} d_{z+2} \#_0 \to c_{z+2} a_1 \to c_{z+2} b_2 \to a_3$,
and for $n > 1$ we obtain

$$\begin{aligned}
a_z \#_0^{2^n - 1}
  &\to^* a_{2n-1} \#_0^{2^{n-1}}
   \to b_{2n} \#_0^{2^{n-1}}
   \to c_{2n+1} d_{2n+1} \#_0^{2^{n-1}} \\
  &\to c_{2n+1} a_1 \#_0^{2^{n-1} - 1}
   \to^* c_{2n+1} a_{2n-1}
   \to c_{2n+1} b_{2n}
   \to a_{2n+1},
\end{aligned}$$

the induction hypothesis being applied twice. Throughout, rewriting is modulo $\mathrm{match}(R_\#)$.

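
For small n, the reachability statement of Claim 2 can be confirmed by exhaustive search. The Python sketch below explores all $\mathrm{match}(R_\#)$-descendants of $a_z \#_0^{2^n - 1}$ and tests whether $a_{2n+1}$ occurs among them. As before, it assumes that $R_\#$ adds the rule $\ell_1 \#^{|\ell_2|} \to r$ for every rule $\ell_1 \ell_2 \to r$ with nonempty $\ell_1, \ell_2$ (here: $d\# \to a$ and $c\# \to a$); all function names are hypothetical.

```python
from collections import deque

R20 = [("a", "b"), ("b", "cd"), ("de", "a"), ("cb", "a")]

def enrich(rules, marker="#"):
    """Assumed definition of R#: R plus l1 marker^|l2| -> r for each split l = l1 l2."""
    extra = [(l[:i] + marker * (len(l) - i), r)
             for l, r in rules for i in range(1, len(l))]
    return list(rules) + extra

def match_successors(rules, word):
    """One-step successors of a tuple of (letter, height) pairs: the reduct
    gets height 1 + (minimal height occurring in the redex)."""
    letters = "".join(letter for letter, _ in word)
    for lhs, rhs in rules:
        pos = letters.find(lhs)
        while pos != -1:
            h = 1 + min(height for _, height in word[pos:pos + len(lhs)])
            yield word[:pos] + tuple((c, h) for c in rhs) + word[pos + len(lhs):]
            pos = letters.find(lhs, pos + 1)

def reachable(rules, start):
    """Breadth-first exploration of all annotated descendants of start."""
    seen, queue = {start}, deque([start])
    while queue:
        word = queue.popleft()
        for nxt in match_successors(rules, word):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def claim_holds(n, z):
    """Is a_{2n+1} reachable from a_z followed by 2^n - 1 markers of height 0?"""
    start = (("a", z),) + (("#", 0),) * (2 ** n - 1)
    return (("a", 2 * n + 1),) in reachable(enrich(R20), start)

print([claim_holds(n, z) for n in (1, 2, 3) for z in (0, 1)])
```

The search is finite because every marker can be consumed only once, so only finitely many annotated strings are reachable from each start string.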
Example 21. Even if a string rewriting system R is both match-bounded for RFC(R) and
RFC-match-bounded, the corresponding least match-bounds may differ by any given number
$k > 0$. This is shown for the system

$R = \{a_{i-1} \to a_i,\ b_{i-1} \to b_i \mid 1 \le i < k\} \cup \{a_{k-1} \to cd,\ de \to b_0,\ c b_{k-1} \to a_0\}$

over the alphabet $\{a_0, \dots, a_{k-1}, b_0, \dots, b_{k-1}, c, d, e\}$ (here the subscripts are part of the
letter names, not match heights). As is easily seen, R is match-bounded for
RFC(R) by $k + 1$, whereas $R_\#$ is match-bounded for $\mathrm{rhs}(R) \cdot \#^*$ by $2k + 1$. So the difference
between these bounds is indeed k.

For completeness' sake we also mention a sufficient criterion for RFC-match-boundedness.
We will use the set of left hand sides of forward closures of a rewriting system R, denoted
by LFC(R).

We remark that computation of LFC(R) seems to require the construction of the full set
FC(R), a step that could be avoided for RFC(R).

Proposition 6. If a string rewriting system R is match-bounded for LFC(R) by c, then R is
RFC-match-bounded by c.

Proof. For any step that uses a rule $\ell_1 \#^k \to r$, it is possible to reconstruct some string $\ell_2$
with $\ell_1 \ell_2 \to r$ in R that $\#^k$ represents. This transformation preserves match heights. □


Example 22. The least RFC-match-bound of $R = \{aa \to aba\}$ from Example 8 is 1. The
least match-bound of R for $\mathrm{LFC}(R) = aa^+$, however, is 2.

Example 23. The system $R = \{aba \to a,\ ab \to ba\}$ is RFC-match-bounded by 1, but R is
not match-bounded for $a(ba)^+ \subseteq \mathrm{LFC}(R)$.
8 IMPLEMENTING MATCH-BOUNDS: MATCHBOX


We have implemented the algorithms presented in this paper (Theorems 4 and 6) in a
program called Matchbox. It can be accessed via a CGI interface at
http://theol.informatik.uni-leipzig.de/matchbox/; its Haskell source is available.

The program fared quite well in the recent “termination competition” held at the 6th 
International Workshop on Termination (WST 2003) at Valencia, Spain. Unlike its competi- 
tors, however, Matchbox only addresses string rewriting. 

In particular, Matchbox is able to prove termination for a large number of one-rule string
rewrite systems for which all standard automated methods (like path orderings and
polynomial interpretations) fail, and for which only complicated ad-hoc proofs were known, if
any. The list below contains those one-rule systems that are left from an attempt to classify
termination of all (approx. $6.7 \cdot 10^9$) one-rule systems $\{\ell \to r\}$ where $|\ell| < |r| \le 9$. They
could not be solved by any previously known method [11].


$\{abaab \to baabbaa\}$,
$\{aabaaab \to baaabbaaa\}$,
$\{baabba \to aabbaaabb\}$,
$\{aabaaba \to abaabaaab\}$,
$\{babbaa \to abbaabba\}$,
$\{ababaab \to baabbabaa\}$,
$\{caabca \to aabccaabc\}$,
$\{abaab \to baabbaaba\}$.


Matchbox yields proofs that all these systems are RFC-match-bounded by 2. 

9 A COMPARISON TO THE TERMINATION HIERARCHY 

We have shown that for string rewriting systems R the following implications are valid: 


match-bounded $\Rightarrow$
match-bounded for LFC(R) $\Rightarrow$
RFC-match-bounded $\Rightarrow$
match-bounded for RFC(R) $\Rightarrow$
terminating

None of these implications can be reversed: The system $\{ab \to ba\}$ from Example 10 is
match-bounded for $\mathrm{LFC}(R) = ab^+$, but not match-bounded. Example 23 is
RFC-match-bounded but not match-bounded for LFC(R). Example 20 contains a counterexample to the
converse of the third implication. Finally, the system $B_3$ from Example 18 is terminating
though not match-bounded for $\mathrm{RFC}(B_3)$. It is interesting to compare these results with
Zantema's termination hierarchy [30, 31]:

polynomially terminating $\Rightarrow$ $\omega$-terminating $\Rightarrow$ totally terminating $\Rightarrow$
simply terminating $\Rightarrow$ non-self-embedding $\Rightarrow$ terminating

As it turns out, this hierarchy is orthogonal to all four properties mentioned above. Indeed,
$B_3 = \{ab \to ba, ac \to ca, bc \to cb\}$ is polynomially terminating (choose $n \mapsto 3n + 1$,
$n \mapsto 2n + 1$ and $n \mapsto n + 1$ as interpretations for a, b and c respectively), but not
match-bounded for $\mathrm{RFC}(B_3)$. And $\{aa \to aba\}$ is a self-embedding system (aa embeds in aba)
that is nevertheless match-bounded.
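
The polynomial interpretation for $B_3$ can be verified by composing the three linear maps. The sketch below assumes the usual convention that a string $x_1 x_2 \dots x_k$ is interpreted as the composition $[x_1] \circ [x_2] \circ \dots \circ [x_k]$, so the rightmost letter is applied first; this convention is an assumption of the sketch and is not spelled out in the text above. A linear map $n \mapsto pn + q$ is stored as the pair `(p, q)`, and the names `interpret` and `decreases` are hypothetical.

```python
# letter -> (p, q), standing for the map n |-> p*n + q
INTERPRETATION = {"a": (3, 1), "b": (2, 1), "c": (1, 1)}

def interpret(word):
    """Compose the maps of all letters; the rightmost letter is applied first."""
    p, q = 1, 0                      # identity map
    for letter in reversed(word):
        lp, lq = INTERPRETATION[letter]
        p, q = lp * p, lp * q + lq   # apply n |-> lp*n + lq after the current map
    return p, q

def decreases(lhs, rhs):
    """True iff [lhs](n) > [rhs](n) for every natural number n.

    For linear maps this holds exactly when the slope does not drop
    and the constant term strictly decreases."""
    (p1, q1), (p2, q2) = interpret(lhs), interpret(rhs)
    return p1 >= p2 and q1 > q2

B3 = [("ab", "ba"), ("ac", "ca"), ("bc", "cb")]
for lhs, rhs in B3:
    print(lhs, interpret(lhs), rhs, interpret(rhs), decreases(lhs, rhs))
```

For the first rule this gives $6n + 4$ for ab against $6n + 3$ for ba; since all three maps are strictly monotone, the strict decrease carries over to arbitrary contexts.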
10 CONCLUSION

If the flow of information during rewriting is suitably restricted, some desirable properties 
hold: termination, bounded derivational complexity, or preservation of regular languages. 
For instance, McNaughton [21] and independently Ferreira and Zantema [9] use extra letters 
to indicate the absence of information flow through certain positions. Kobayashi et al. [18] 
restrict derivations by using markers for the start and the end of a redex. Sénizergues [27]
constructs finite automata to solve the termination problem for certain one-rule string
rewriting systems. Moczydlowski and Geser [22, 23] restrict the way the right hand side of a rule
may be consumed in order to simulate the rewrite relation by the computation of a pushdown 
automaton. 

With our concepts of deleting and match-bounded string rewriting, we aim at extending 
these approaches to a systematic theory of termination by language properties. Regularity 
preservation forms a basis for automated termination proofs. We present two variants to 
demonstrate some of the potential of this new approach. Match-boundedness on the set 
of all strings over the given alphabet is easiest to conceive. On the other hand, match- 
boundedness on more restricted sets, for instance the right hand sides of forward closures, 
may significantly enlarge the application domain. Each method can solve hard examples, 
like Zantema’s system. 

We expect these powerful criteria to enable some major progress in the decision problem 
of uniform termination of one-rule string rewriting systems, a problem open for 13 years [19] 
(see also [26, Problem 21]). Our hope is supported by the fact that some hard one-rule 
systems can now be proven terminating automatically. 

Single-player games like Peg Solitaire can be analyzed through the construction of reach- 
ability sets. It is challenging to extend this approach to two-player rewriting games [29]. 
Interesting properties are termination, which is necessary for a well-defined game, or regu- 
larity of winning sets. Even the impartial case is hard; here the central question is whether 
Grundy values are bounded. 

It seems natural to carry over the notion of match-boundedness to term rewriting, in 
order to obtain both closure properties and new automated termination proof methods. 